
Collaborative Medical Triage under Uncertainty: A Multi-Agent Dynamic Matching Approach

Cheng, Hongyan, Yu, Chengzhang, Shi, Yanshu, Wang, Chiyue, Liu, Cong, Jin, Zhanpeng

arXiv.org Artificial Intelligence

The post-pandemic surge in healthcare demand, coupled with critical nursing shortages, has placed unprecedented pressure on medical triage systems, necessitating innovative AI-driven solutions. We present a multi-agent interactive intelligent system for medical triage that addresses three fundamental challenges in current AI-based triage systems: inadequate medical specialization leading to misclassification, heterogeneous department structures across healthcare institutions, and inefficient detail-oriented questioning that impedes rapid triage decisions. Our system employs three specialized agents--RecipientAgent, InquirerAgent, and DepartmentAgent--that collaborate through Inquiry Guidance and Classification Guidance mechanisms to transform unstructured patient symptoms into accurate department recommendations. To ensure robust evaluation, we constructed a comprehensive Chinese medical triage dataset from "Ai Ai Yi Medical Network", comprising 3,360 real-world cases spanning 9 primary departments and 62 secondary departments. Experimental results demonstrate that our multi-agent system achieves 89.6% accuracy in primary department classification and 74.3% accuracy in secondary department classification after four rounds of patient interaction. The system's dynamic-matching-based guidance mechanisms enable efficient adaptation to diverse hospital configurations while maintaining high triage accuracy. The resulting multi-agent triage system not only adapts to organizational heterogeneity across healthcare institutions but also ensures clinically sound decision-making.
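The abstract names the three agents but not their internals. The following is a minimal sketch of how such a pipeline could be wired together; the agent names come from the paper, while all internal logic here (keyword matching, the round limit, the department table) is assumed for illustration — the actual system uses LLM-backed agents and guidance mechanisms.

```python
# Hypothetical sketch of the three-agent triage loop. Only the agent names
# are from the abstract; the logic below is a stand-in for LLM-backed agents.
from dataclasses import dataclass, field


@dataclass
class TriageState:
    symptoms: list = field(default_factory=list)
    rounds: int = 0


class RecipientAgent:
    """Turns free-text patient input into structured symptom terms."""

    def receive(self, state: TriageState, text: str) -> None:
        state.symptoms.extend(t.strip().lower() for t in text.split(",") if t.strip())


class InquirerAgent:
    """Asks follow-ups until there is enough signal or a round limit is hit
    (the paper reports accuracy after four rounds of interaction)."""

    MAX_ROUNDS = 4

    def next_question(self, state: TriageState):
        if state.rounds >= self.MAX_ROUNDS or len(state.symptoms) >= 3:
            return None  # enough information gathered
        state.rounds += 1
        return "Can you describe any other symptoms?"


class DepartmentAgent:
    """Scores symptoms against a hospital-specific department table; swapping
    this table is what lets the system adapt to heterogeneous departments."""

    def __init__(self, departments: dict):
        self.departments = departments

    def recommend(self, state: TriageState) -> str:
        scores = {d: len(kw & set(state.symptoms)) for d, kw in self.departments.items()}
        return max(scores, key=scores.get)


# Hypothetical department table; a real deployment would load the
# institution's own primary/secondary department hierarchy.
departments = {
    "cardiology": {"chest pain", "palpitations"},
    "gastroenterology": {"abdominal pain", "nausea"},
}
state = TriageState()
RecipientAgent().receive(state, "chest pain, palpitations")
recommendation = DepartmentAgent(departments).recommend(state)
print(recommendation)  # cardiology
```

The key design point mirrored here is that the department structure is data, not code, so the same agents can serve hospitals with different department hierarchies.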


Zero-shot Performance of Generative AI in Brazilian Portuguese Medical Exam

Truyts, Cesar Augusto Madid, Rabelo, Amanda Gomes, de Souza, Gabriel Mesquita, Lages, Daniel Scaldaferri, Pereira, Adriano Jose, Flato, Uri Adrian Prync, Reis, Eduardo Pontes dos, Vieira, Joaquim Edson, Silveira, Paulo Sergio Panse, Junior, Edson Amaro

arXiv.org Artificial Intelligence

Artificial intelligence (AI) has shown the potential to revolutionize healthcare by improving diagnostic accuracy, optimizing workflows, and personalizing treatment plans. Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) have achieved notable advancements in natural language processing and medical applications. However, the evaluation of these models has focused predominantly on the English language, leading to potential biases in their performance across different languages. This study investigates the capability of six LLMs (GPT-4.0 Turbo, LLaMA-3-8B, LLaMA-3-70B, Mixtral 8x7B Instruct, Titan Text G1-Express, and Command R+) and four MLLMs (Claude-3.5-Sonnet, Claude-3-Opus, Claude-3-Sonnet, and Claude-3-Haiku) to answer questions written in Brazilian Portuguese from the medical residency entrance exam of the Hospital das Clínicas da Faculdade de Medicina da Universidade de São Paulo (HCFMUSP) - the largest health complex in South America. The performance of the models was benchmarked against human candidates, analyzing accuracy, processing time, and coherence of the generated explanations. The results show that while some models, particularly Claude-3.5-Sonnet and Claude-3-Opus, achieved accuracy levels comparable to human candidates, performance gaps persist, particularly in multimodal questions requiring image interpretation. Furthermore, the study highlights language disparities, emphasizing the need for further fine-tuning and dataset augmentation for non-English medical AI applications. Our findings reinforce the importance of evaluating generative AI in various linguistic and clinical settings to ensure fair and reliable deployment in healthcare. Future research should explore improved training methodologies, stronger multimodal reasoning, and real-world clinical integration of AI-driven medical assistance.


Comparisons between a Large Language Model-based Real-Time Compound Diagnostic Medical AI Interface and Physicians for Common Internal Medicine Cases using Simulated Patients

Park, Hyungjun, Woo, Chang-Yun, Lim, Seungjo, Lim, Seunghwan, Kwak, Keunho, Jeong, Ju Young, Suh, Chong Hyun

arXiv.org Artificial Intelligence

Objective To develop an LLM-based real-time compound diagnostic medical AI interface and to conduct a clinical trial comparing this interface with physicians for common internal medicine cases based on United States Medical Licensing Examination (USMLE) Step 2 Clinical Skills (CS)-style exams. Methods A nonrandomized clinical trial was conducted on August 20, 2024. We recruited one general physician, two internal medicine residents (2nd and 3rd year), and five simulated patients. The clinical vignettes were adapted from the USMLE Step 2 CS-style exams. We developed 10 representative internal medicine cases based on actual patients and included information available on initial diagnostic evaluation. The primary outcome was the accuracy of the first differential diagnosis. Repeatability was evaluated based on the proportion of agreement. Results The accuracy of the physicians' first differential diagnosis ranged from 50% to 70%, whereas the real-time compound diagnostic medical AI interface achieved an accuracy of 80%. The proportion of agreement for the first differential diagnosis was 0.7. The accuracy of the first and second differential diagnoses ranged from 70% to 90% for physicians, whereas the AI interface achieved an accuracy rate of 100%. The average time for the AI interface (557 sec) was 44.6% shorter than that of the physicians (1006 sec). The AI interface ($0.08) also reduced costs by 98.1% compared to the physicians' average ($4.2). Patient satisfaction scores ranged from 4.2 to 4.3 for care by physicians and were 3.9 for the AI interface. Conclusion An LLM-based real-time compound diagnostic medical AI interface demonstrated diagnostic accuracy and patient satisfaction comparable to those of a physician, while requiring less time and lower costs. These findings suggest that AI interfaces may have the potential to assist primary care consultations for common internal medicine cases.
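The reported reductions follow directly from the figures in the abstract; a quick back-of-the-envelope check confirms them:

```python
# Verify the reported time and cost reductions from the abstract's figures.
ai_time, md_time = 557, 1006   # seconds per consultation
ai_cost, md_cost = 0.08, 4.2   # US dollars per consultation

time_reduction = (md_time - ai_time) / md_time * 100  # percent shorter
cost_reduction = (md_cost - ai_cost) / md_cost * 100  # percent cheaper

print(f"{time_reduction:.1f}% faster, {cost_reduction:.1f}% cheaper")
# 44.6% faster, 98.1% cheaper
```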


Evaluation of Bias Towards Medical Professionals in Large Language Models

Chen, Xi, Xu, Yang, You, MingKe, Wang, Li, Liu, WeiZhi, Li, Jian

arXiv.org Artificial Intelligence

This study evaluates whether large language models (LLMs) exhibit biases towards medical professionals. Fictitious candidate resumes were created to control for identity factors while maintaining consistent qualifications. Three LLMs (GPT-4, Claude-3-haiku, and Mistral-Large) were tested using a standardized prompt to evaluate resumes for specific residency programs. Explicit bias was tested by changing gender and race information, while implicit bias was tested by changing names while hiding race and gender. Physician data from the Association of American Medical Colleges was used to compare with real-world demographics. 900,000 resumes were evaluated. All LLMs exhibited significant gender and racial biases across medical specialties. Gender preferences varied, favoring male candidates in surgery and orthopedics, while preferring females in dermatology, family medicine, obstetrics and gynecology, pediatrics, and psychiatry. Claude-3 and Mistral-Large generally favored Asian candidates, while GPT-4 preferred Black and Hispanic candidates in several specialties. Tests revealed strong preferences towards Hispanic females and Asian males in various specialties. Compared to real-world data, LLMs consistently chose higher proportions of female and underrepresented racial candidates than their actual representation in the medical workforce. GPT-4, Claude-3, and Mistral-Large showed significant gender and racial biases when evaluating medical professionals for residency selection. These findings highlight the potential for LLMs to perpetuate biases and compromise healthcare workforce diversity if used without proper bias mitigation strategies.
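The core of the protocol described above is a counterfactual design: hold qualifications fixed, vary only the identity signal, and compare model judgments across variants. The sketch below illustrates that design under stated assumptions — the resume text, scoring function, and identity categories are placeholders, and the `evaluate` stub stands in for calls to the actual models tested (GPT-4, Claude-3-haiku, Mistral-Large).

```python
# Sketch of a counterfactual resume audit: identical qualifications,
# varied identity signal. All concrete values here are hypothetical.
from itertools import product

BASE_RESUME = "MD, 3 publications, honors in clinical rotations, strong letters."
GENDERS = ["male", "female"]
RACES = ["Asian", "Black", "Hispanic", "White"]


def make_resume(gender: str, race: str) -> str:
    # Explicit-bias condition: identity stated outright.
    # (The study's implicit condition instead varies candidate names
    # while hiding gender and race.)
    return f"Gender: {gender}. Race: {race}. {BASE_RESUME}"


def evaluate(resume: str) -> float:
    # Placeholder for an LLM call returning a suitability score for a
    # given residency program. An unbiased model would return the same
    # score for every identity variant of the same resume.
    return 0.5


scores = {(g, r): evaluate(make_resume(g, r)) for g, r in product(GENDERS, RACES)}

# Bias shows up as systematic score gaps across identity variants.
gap = max(scores.values()) - min(scores.values())
print(f"variants evaluated: {len(scores)}, max score gap: {gap:.2f}")
```

With a constant stub the gap is zero by construction; in the study, repeating this comparison at scale (900,000 resumes) surfaced significant gaps across specialties.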


'The Last of Us' tells a new but familiar queer love story

Washington Post - Technology News

But however revolutionary their deaths might be for the universe of "The Last of Us," they still fall into well-worn gay death tropes. It seems that Bill is older than Frank, but Frank succumbs to an unspecified illness and ends up infirm, which ultimately prompts his suicide. If you grew up queer in the 80s and 90s, the image of one gay man pushing another in a wheelchair might look fiercely familiar from the early days of the AIDS crisis and the storytelling that came out of it. Many cis gay men of my generation believed this kind of death was inevitable, that they would die tended to by a lover or they would be the widower left behind. Bill rebels against this trope by dying alongside Frank, but as I watched (and cried) as Bill wheeled Frank around their house and handed him his pills, I thought of how many times I had seen this scene in other movies and television. I wondered why the show's creators chose to have Frank sicken to lead to Bill and Frank's deaths when one or both of their ages could have been the inciting factor.


How Data and Smart Technology Are Helping Hospitalists

#artificialintelligence

The increasing complexity of patient care, difficulties with time management, and the burden of administrative tasks while complying with regulations are a few of the overarching challenges that come with the job. Fortunately, big data and smart technology are helping hospitalists overcome these issues. Medical billing, for one, is notoriously error-prone: some estimates suggest that upward of 80% of medical bills contain errors.


The top 100 new technology innovations of 2022

#artificialintelligence

On a cloudy Christmas morning last year, a rocket carrying the most powerful space telescope ever built blasted off from a launchpad in French Guiana. After reaching its destination in space about a month later, the James Webb Space Telescope (JWST) began sending back sparkling presents to humanity--jaw-dropping images that are revealing our universe in stunning new ways. Every year since 1988, Popular Science has highlighted the innovations that make living on Earth even a tiny bit better. And this year--our 35th--has been remarkable, thanks to the successful deployment of the JWST, which earned our highest honor as the Innovation of the Year. But it's just one item out of the 100 stellar technological accomplishments our editors have selected to recognize. The list below represents months of research, testing, discussion, and debate. It celebrates exciting inventions that are improving our lives in ways both big and small. These technologies and discoveries are teaching us about the ...


This AI tool predicts whether COVID patients will live or die

#artificialintelligence

A tool has been developed to help healthcare professionals identify hospitalised patients most at risk of dying from COVID-19 using artificial intelligence (AI). The algorithm could help doctors to direct critical care resources to those in most immediate need, which the developers of the AI tool say could be especially valuable to resource-limited countries. And with no end in sight for the coronavirus pandemic, with new variants leading to fresh waves of sickness and hospitalisation, the scientists behind the tool say there is a need for generalised tools like this which can be easily rolled out. To develop the tool, scientists used biochemical data from routine blood samples taken from nearly 30,000 patients hospitalised in over 150 hospitals in Spain, the US, Honduras, Bolivia and Argentina between March 2020 and February 2022. Taking blood from so many patients meant the team were able to capture data from people with different immune statuses – vaccinated, unvaccinated and those with natural immunity – and from people infected with every variant of COVID-19.


Bayesian Kernelised Test of (In)dependence with Mixed-type Variables

Benavoli, Alessio, de Campos, Cassio

arXiv.org Machine Learning

A fundamental task in AI is to assess (in)dependence between mixed-type variables (text, image, sound). We propose a Bayesian kernelised correlation test of (in)dependence using a Dirichlet process model. The new measure of (in)dependence allows us to answer some fundamental questions: Based on data, are (mixed-type) variables independent? How likely is dependence/independence to hold? How high is the probability that two mixed-type variables are more than just weakly dependent? We analyse the theoretical properties of the approach and present algorithms for fast computation. We empirically demonstrate the effectiveness of the proposed method by analysing its performance and by comparing it with other frequentist and Bayesian approaches on a range of datasets and tasks with mixed-type variables.
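The paper's test is Bayesian (built on a Dirichlet process model), which this listing does not detail. As a simpler frequentist point of comparison — not the authors' method — the sketch below computes HSIC, a standard kernelised dependence measure whose population value is zero iff the variables are independent under a characteristic kernel such as the RBF:

```python
# Frequentist kernelised dependence measure (HSIC), shown only as a
# simpler analogue of the kernelised (in)dependence testing the paper
# treats in a Bayesian framework.
import numpy as np


def rbf_kernel(x: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    # Gram matrix K[i, j] = exp(-(x_i - x_j)^2 / (2 sigma^2))
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / (2 * sigma**2))


def hsic(x: np.ndarray, y: np.ndarray) -> float:
    # Biased HSIC estimator: trace(K H L H) / (n - 1)^2,
    # where H centres the Gram matrices.
    n = len(x)
    K, L = rbf_kernel(x), rbf_kernel(y)
    H = np.eye(n) - np.ones((n, n)) / n
    return float(np.trace(K @ H @ L @ H) / (n - 1) ** 2)


rng = np.random.default_rng(0)
x = rng.normal(size=200)
dep = hsic(x, x + 0.1 * rng.normal(size=200))  # strongly dependent pair
ind = hsic(x, rng.normal(size=200))            # independent pair
print(dep > ind)  # True: dependence yields a markedly larger statistic
```

Handling mixed-type variables, as the paper does, would amount to choosing an appropriate kernel per data type; the Bayesian construction additionally yields posterior probabilities of dependence rather than a single statistic.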


4 Tips to Improve Your Statistical Literacy

#artificialintelligence

Statistical literacy (assessing statistical statements, arguments and associations) is extremely important for producing and interpreting results from data analysis, yet it usually isn't a part of mainstream statistics education [1]. From the correlation-causation error to immortal time bias, there are many ways to invalidate your results. You can lessen the odds by following a few good practices. When you design your analysis, make sure you're asking the right question. This isn't always easy, as the German Federal Ministry of the Interior, Building and Home Affairs found out after publishing a 2018 press release concerning the "successful" use of facial recognition technology at train stations [2].